We study mentorship in scientific collaborations, where a junior scientist is supported by potentially multiple senior collaborators, without them necessarily having formal supervisory roles. We identify 3 million mentor-protege pairs and survey a random sample, verifying that their relationship involved some form of mentorship. We find that mentorship quality predicts the scientific impact of the papers written by proteges post mentorship without their mentors. We also find that increasing the proportion of female mentors is associated not only with a reduction in post-mentorship impact of female proteges, but also a reduction in the gain of female mentors. While current diversity policies encourage same-gender mentorships to retain women in academia, our findings raise the possibility that opposite-gender mentorship may actually increase the impact of women who pursue a scientific career. These findings add a new perspective to the policy debate on how to best elevate the status of women in science.
Mentorship contributes to the advancement of individual careers and provides continuity in organizations^^. By mentoring novices, senior members pass on the organizational culture, best practices, and the inner workings of a profession. In this way, the mentor-protege relationship provides the social glue that links generations within a field. Mentorship can also lower the barriers to entry for underrepresented minorities, such as women and people of color, by providing role models, access to informal networks, and cultural capital, thereby acting as an equalizing force^^. Most workplaces have shifted from the classic master-apprentice model towards a team-based model, where the mentorship of juniors is distributed amongst the senior members of the team. As a result, it has become commonplace for juniors to be mentored by senior colleagues who are not necessarily their formal supervisors^^. In the context of academic collaboration, the role of mentorship in supporting early-career scientists is widely recognized^^. We analyze mentorship in this context, where a less experienced scientist is mentored by more experienced collaborators, without restricting our analysis to only the thesis advisor.

Academic publications provide a documented record of millions of collaborations spread over decades, and have already proven to be a fertile ground for exploring a wide variety of topics, including innovation^^, diversity^^, productivity^^, team assembly^^, and individual success^^, thereby giving rise to the field of Science of Science^^. We harness the potential of this rich dataset to study mentorship by analyzing academic collaborations between junior and senior scientists, since such collaborations play an important role in shaping the junior scientist’s persona in terms of their research focus^^, professional ethics, and work culture^^. Furthermore, we build on the expanding literature on gender equity and diversity in science^^ and analyze mentorship experiences from the perspective of both female and male scientists.

Compared to previous studies on mentorship in academia^^, ours has the following advantages. First, instead of restricting our analysis to the thesis advisor, we study mentorship in its broader sense, which may involve multiple senior collaborators who may or may not hold a formal supervisory role. Second, we avoid sample selectivity as well as recall and recency biases, since we analyze the actual scientific impact of collaborations rather than self-reported information. Third, we analyze thousands of journals spanning multiple scientific disciplines, rather than restricting our focus to just a single one of them. Fourth, we construct careful comparisons between millions of mentor-protege pairs, allowing us to better understand the association between mentorship quality and scientific careers. Finally, our study complements the literature on the relationship between mentorship and attrition from science^^, as we consider proteges who remain scientifically active after the completion of their mentorship period.

It should be noted that we are not the first to study how the impact of junior scientists is related to the impact of their past collaborators. A recent study by Li et al.^^ found that juniors who publish with top scientists enjoy a persistent competitive advantage throughout the rest of their careers. More specifically, they focus on collaborators who are among the 5% most impactful scientists in any given year, regardless of whether they are senior or junior. In contrast, as we will show, our study focuses on collaborators who are likely to have served as mentors, regardless of whether they are among the top 5%. In other words, Li et al. study coauthorship with top scientists, while we study coauthorship with mentors. Another difference between their study and ours is that they do not address the fundamental question of whether the social capital of collaborators matters more than their impact; we address this question by analyzing not only the mentors’ impact but also their collaboration network. Finally, unlike their paper, our study complements existing literature on women in science by analyzing the gender of both the proteges and their mentors, and how these shape mentorship experiences.

Another recent paper that is closely related to ours is the one by Ma et al.^^, who study how the success of junior scientists is related to the ability of their mentors to create and communicate prizewinning research. As such, their work resembles ours in the sense that they also study some form of academic success and how it is related to mentorship. However, they study formal mentorship, where the mentor is the official PhD advisor of the protege. In contrast, our study covers informal mentorship whereby juniors are mentored by multiple senior colleagues without them necessarily having formal supervisory roles. Furthermore, their analysis of the protege’s performance post mentorship includes papers written with the mentors, leading to their finding that coauthoring with one’s advisor is inversely correlated with one’s success. In contrast, our analysis excludes papers written with any of the scientists who served as mentors during the mentorship experience; this ensures that the observed impact is not attributed to the mentors but rather to the proteges.

Results 

Identifying mentor-protege pairs. We analyze 215 million scientists and 222 million papers taken from the Microsoft Academic Graph (MAG) dataset^^, which contains detailed records of scientific publications and their citation network. We address the name disambiguation problem (see Supplementary Note 1), and we use external techniques and data sources to establish the gender of scientists and the rank of their affiliations (see “Methods” section and Supplementary Note 2). We distinguish between junior and senior scientists based on their academic age, measured by the number of years since their first publication. The junior years are those during which a scientist participates in graduate and postdoctoral training, and possibly the first few years of being a faculty member or researcher. In contrast, the senior years are those during which a scientist typically accumulates experience as a PI and transitions into a supervisory role. For any given scientist, we consider the first 7 years of their career to be their junior years, and the years after that to be their senior years. Whenever a junior scientist publishes a paper with a senior scientist, we consider the former to be a protege and the latter to be a mentor, as long as they coauthored at least one paper with 20 or fewer authors and share the same discipline and a US-based affiliation; see Supplementary Note 3 for more details. Our use sample consists of 3 million unique mentor-protege pairs, spanning ten disciplines (Biology, Chemistry, Computer Science, Economics, Engineering, Geology, Materials Science, Medicine, Physics, and Psychology) and over a century of research; these disciplines contain over 97% of all pairs identified as per the criteria above.
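The pairing rule above can be illustrated with a short Python sketch. It is a simplified illustration rather than the authors' pipeline, and it rests on several assumptions: academic age is taken as the publication year minus the year of the scientist's first publication; each paper carries a hypothetical `us_affiliations` field mapping each author to the set of US-based institutions listed for them on that paper; and each scientist's main discipline has already been assigned. The three pairing conditions (a joint paper with 20 or fewer authors, a shared US-based affiliation, and the same discipline) are accumulated across all joint papers from the protege's junior years.

```python
from collections import defaultdict
from dataclasses import dataclass, field

JUNIOR_MAX_AGE = 7  # junior years: academic age <= 7 (assumed age = year - first publication year)
MAX_AUTHORS = 20    # at least one joint paper must have 20 or fewer authors


@dataclass
class Paper:
    year: int
    authors: tuple
    # Hypothetical schema: author id -> set of US-based institution ids on this paper
    us_affiliations: dict = field(default_factory=dict)


def academic_age(year, first_pub_year):
    return year - first_pub_year


def mentor_protege_pairs(papers, first_pub_year, discipline):
    """Return the set of (mentor, protege) pairs satisfying the sketched criteria."""
    small_paper = defaultdict(bool)   # pair has a joint paper with <= MAX_AUTHORS authors
    shared_affil = defaultdict(bool)  # pair shared a US-based affiliation on a joint paper
    for p in papers:
        for junior in p.authors:
            if academic_age(p.year, first_pub_year[junior]) > JUNIOR_MAX_AGE:
                continue  # only juniors can be proteges
            for senior in p.authors:
                if academic_age(p.year, first_pub_year[senior]) <= JUNIOR_MAX_AGE:
                    continue  # only seniors can be mentors
                key = (senior, junior)
                small_paper[key] |= len(p.authors) <= MAX_AUTHORS
                shared = p.us_affiliations.get(junior, set()) & p.us_affiliations.get(senior, set())
                shared_affil[key] |= bool(shared)
    return {
        (s, j)
        for (s, j) in small_paper
        if small_paper[(s, j)]
        and shared_affil[(s, j)]
        and discipline.get(s) is not None
        and discipline.get(s) == discipline.get(j)
    }
```

Checking the conditions per pair rather than per paper follows the Methods wording, which lists them as separate requirements on the dyad; whether they were meant to hold on one and the same paper is not stated here.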

Survey results. While we acknowledge that it is possible for juniors to receive support from their junior collaborators, we interpret mentorship as the support that juniors receive from their senior collaborators, following the standard definition of mentorship as “the activity of giving a younger or less experienced person help and advice over a period of time” (https://dictionary.cambridge.org/dictionary/english/mentorship). Based on this definition, the difference in experience between the protege and their mentor seems to be a necessary, albeit not sufficient, condition for the relationship to be considered mentorship.
In addition to the difference in experience, the relationship also needs to involve some form of support from the mentor to the protege. Arguably, the fact that the mentor has coauthored a paper with the protege provides evidence that the former indeed supported the latter. Nevertheless, it would be desirable to provide further evidence that the mentor supported the protege in ways related not only to the paper on which they collaborated, but also to career development in general. To verify whether this is the case, we sampled 2000 scientists whom we identified as proteges and asked them about their relationship with their mentors. We manually extracted their emails from publicly available sources, such as their personal web pages, and invited them to fill out a survey about scientific collaborations. Out of those 2000 scientists, 167 completed the survey; see Supplementary Note 4 for more details. A summary of the survey results is provided in Fig. 1. More specifically, Fig. 1a presents the distribution of the responses to five questions, each asking whether the protege has received advice from the mentor about a different career-building skill. As can be seen, for each skill, a high percentage of proteges agreed (strongly or otherwise) that they had received advice from the senior collaborator about that skill, with the percentage ranging from 72 to 85% depending on the skill.

NATURE COMMUNICATIONS | (2020)11:5855 | https://doi.org/10.1038/s41467-020-19723-8 | www.nature.com/naturecommunications

[Figure 1: a, distributions of responses (strongly disagree, disagree, agree, strongly agree) to “I received advice from him/her about...” for five skills; b, proportion of participants who agree or strongly agree to at least x statements; c, d, responses to “Which of these statements are true about your collaborator?”]

Fig. 1 Survey outcome. Responses of 167 randomly chosen scientists who were identified as proteges and asked about their relationship to a scientist who was identified as one of their mentors. a Distributions of the responses to each of five statements regarding their senior collaborator, where the statements take the form "I received advice from him/her about..." followed by five different skills: (i) writing; (ii) research study/design; (iii) data analysis/modeling; (iv) addressing reviewer comments; (v) selecting a venue for publication. b A different way of summarizing the responses in a, showing the proportion of participants who either agree or strongly agree to at least x out of the five statements regarding their senior collaborator, where x ∈ {1, ..., 5}. c The percentage of proteges who selected true for each of the following four statements regarding their senior collaborator: (i) I received grant writing advice from him/her; (ii) I received a letter of recommendation from him/her for a fellowship/award or job application; (iii) I received career planning advice from him/her; (iv) He/she put me in touch with an important person in my field. d A different way of summarizing the responses in c, showing the proportion of participants who selected true for at least x out of the four statements regarding their senior collaborator, where x ∈ {1, ..., 4}. Source data are provided as a Source Data file.

Figure 1b summarizes the responses differently, by presenting the percentage of proteges who agreed (strongly or otherwise) to at least x out of the five statements, where x ranges from 1 to 5. As can be seen, 95% agreed (strongly or otherwise) that they had received advice from their senior collaborator regarding at least one skill. Figure 1c, d summarize the responses to a different set of questions, focusing on the support that the protege has received from the senior collaborator regarding different aspects of career development outside the context of their joint publication. We find that almost 80% stated that they had received support from their senior collaborator regarding at least one of those aspects. Similar trends were observed when considering only the proteges who stated that the identified mentor was neither their thesis advisor nor a member of their thesis committee; see Supplementary Fig. 1. Broadly similar trends were also observed when considering each discipline in isolation; see Supplementary Figs. 2-5. Altogether, these findings indicate that the relationship between our identified proteges and mentors indeed involved some form of mentorship.
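The “at least x statements” summary of Fig. 1b (and, analogously, Fig. 1d) is a cumulative count over each participant's answers. A minimal sketch, assuming each participant's responses are stored as a list of lowercase Likert labels; the example responses are made up for illustration and are not the survey data.

```python
AGREEING = {"agree", "strongly agree"}


def share_agreeing_at_least(responses, x):
    """Fraction of participants who agree or strongly agree with at least x statements."""
    counts = [sum(answer in AGREEING for answer in person) for person in responses]
    return sum(count >= x for count in counts) / len(counts)


# Hypothetical responses of four participants to the five advice statements
responses = [
    ["agree", "disagree", "strongly agree", "disagree", "disagree"],
    ["strongly disagree"] * 5,
    ["agree"] * 5,
    ["disagree", "agree", "disagree", "disagree", "disagree"],
]
for x in range(1, 6):
    print(x, share_agreeing_at_least(responses, x))
```

Because a participant counted at some x is also counted at every smaller x, these proportions can only decrease as x grows, which is the shape of the bars in Fig. 1b.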

Analyzing mentor-protege pairs. When analyzing all our 
mentor-protege pairs, we consider two alternative measures of 
mentorship quality. The first is the average impact of the mentors 
prior to mentorship, where the prior impact of each mentor is 
computed as their average number of citations per annum up to 
the year of their first publication with the protege. This reflects 
the success of mentors and their standing and reputation in their 
respective scientific communities. We refer to this measure as the 
big-shot experience, as it captures how much of a “big-shot” the 
mentors of the protege are. The second measure of mentorship 
quality that we consider is the average degree of the mentors prior 
to mentorship, where the degree of each mentor is calculated in 
the network of scientific collaborations up to the year of their first 
publication with the protege^^. We refer to this measure as the hub experience, as it reflects how much of a “hub” each mentor is in the collaboration network. These two measures of mentorship quality serve as the independent variables in our study.
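The two measures can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: “citations per annum” is read here as the citations a mentor receives in each calendar year, averaged over the years from their first publication up to and including the year of their first joint paper with the protege; collaborations are given as hypothetical `(year, author_a, author_b)` edges; and each mentor record uses a made-up dictionary layout.

```python
from statistics import mean


def prior_citations_per_year(citations_by_year, first_pub_year, cutoff_year):
    """Average citations per year from the first publication year up to cutoff_year (inclusive)."""
    years = range(first_pub_year, cutoff_year + 1)
    return sum(citations_by_year.get(y, 0) for y in years) / len(years)


def prior_degree(edges, scientist, cutoff_year):
    """Number of distinct coauthors of `scientist` on collaborations dated up to cutoff_year."""
    neighbors = set()
    for year, a, b in edges:
        if year > cutoff_year:
            continue
        if a == scientist:
            neighbors.add(b)
        elif b == scientist:
            neighbors.add(a)
    return len(neighbors)


def big_shot_experience(mentors):
    """Mean prior impact over all of a protege's mentors."""
    return mean(
        prior_citations_per_year(m["citations_by_year"], m["first_pub_year"], m["first_joint_year"])
        for m in mentors
    )


def hub_experience(mentors, edges):
    """Mean prior degree in the collaboration network over all of a protege's mentors."""
    return mean(prior_degree(edges, m["id"], m["first_joint_year"]) for m in mentors)
```

Each mentor is evaluated at their own first joint year with the protege, so a mentor who joins later is measured with a longer history than one who joined early.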

Having discussed our measures of mentorship quality, we now discuss the mentorship outcome, which we conceptualize as the scientific impact of the protege during their senior years without their mentors. We measure this outcome by calculating the average impact of all the papers that satisfy the following two conditions: (i) they were published when the academic age of the protege was greater than 7 years; (ii) the authors include the protege but none of the scientists who were identified as their mentors. The impact of each such paper is calculated as the number of citations that it accumulated 5 years post publication, denoted by $c_5$; this is the measure of scientific impact that will be used throughout the article. Such an outcome measure allows us to assess the quality of the scholar that the protege has become after the mentorship period has concluded.
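The outcome measure can be sketched as below. The citation window is an assumption: a citation is counted if the citing paper appeared between the publication year and five years later, inclusive. The paper records use a hypothetical dictionary layout with `year`, `authors`, and `citing_years` keys.

```python
def c5(pub_year, citing_years, window=5):
    """Citations received within `window` years of publication (inclusive window, an assumption)."""
    return sum(pub_year <= y <= pub_year + window for y in citing_years)


def post_mentorship_impact(protege, first_pub_year, mentors, papers):
    """Average c5 over the protege's senior-year papers that include none of their mentors.

    Returns None when no paper qualifies; such proteges are excluded from the sample.
    """
    scores = [
        c5(p["year"], p["citing_years"])
        for p in papers
        if protege in p["authors"]
        and p["year"] - first_pub_year > 7          # condition (i): senior years only
        and not mentors.intersection(p["authors"])  # condition (ii): no mentor among the authors
    ]
    return sum(scores) / len(scores) if scores else None
```

Excluding every paper that lists any identified mentor is what ties the outcome to the protege's independent work rather than to continued collaboration.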

We aim to establish whether mentorship quality (measured by big-shot experience or hub experience) is associated with the post-mentorship outcome. To this end, we use coarsened exact matching (CEM)^^. While this technique does not establish the existence of a causal effect, it is commonly used to reduce confounding when drawing inferences from observational data. Intuitively, CEM allows us to select a
group of proteges who received a certain level of mentorship 
quality (treatment group), and match it to another group of 
proteges who received a lower level of mentorship quality 
(control group). Comparing the outcome of the two groups 
allows us to determine whether an increase in mentorship quality 
is indeed associated with an increase in the impact of the protege 
post mentorship. In more detail, for each measure of mentorship 
quality, we create a separate CEM where the treatment and 
control groups differ in terms of that measure, but resemble each 
other in terms of an array of characteristics of the proteges, in 
particular, the number of mentors they have, the year in which 
they published their first mentored paper, their scientific 
discipline, their gender, the rank of the affiliation listed on their 
first mentored publication (which is likely to be their PhD 
granting institution), the number of years active post mentorship, 
and the average academic age of their mentors, which is measured 
by first computing the academic age of each mentor in the year of 
their first publication with the protege, and then averaging these 
numbers over all the mentors. Importantly, when studying the 
big-shot experience, we make sure that the two groups are also 
similar in terms of the hub experience, and vice versa. 
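For readers unfamiliar with CEM, the following sketch shows the core idea on toy data: numeric covariates are coarsened into bins, units are grouped into strata by their coarsened covariates, strata lacking either treated or control units are discarded, and controls are reweighted so that each stratum contributes in proportion to its number of treated units. The covariate names, cutpoints, and data are illustrative assumptions; the paper's actual binning decisions are described in Supplementary Note 5.1. The function returns the relative increase of the treated mean over the reweighted control mean, the kind of quantity reported in Fig. 2.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Index of the bin containing value, given sorted bin cutpoints."""
    return sum(value >= c for c in cutpoints)


def cem_relative_effect(units, cutpoints):
    """Relative change in mean outcome, treated vs. CEM-weighted controls.

    units: iterable of (is_treated, covariates, outcome), where covariates is a dict.
    cutpoints: covariate name -> sorted cutpoints; covariates not listed are matched exactly.
    """
    strata = defaultdict(lambda: ([], []))  # stratum key -> (treated outcomes, control outcomes)
    for treated, covs, outcome in units:
        key = tuple(
            coarsen(covs[name], cutpoints[name]) if name in cutpoints else covs[name]
            for name in sorted(covs)
        )
        strata[key][0 if treated else 1].append(outcome)
    matched = [(t, c) for t, c in strata.values() if t and c]  # drop strata missing either group
    if not matched:
        raise ValueError("no stratum contains both treated and control units")
    n_treated = sum(len(t) for t, _ in matched)
    treated_mean = sum(sum(t) for t, _ in matched) / n_treated
    # Weighting each stratum's control mean by its share of treated units is
    # equivalent to applying the standard CEM weights to individual controls.
    control_mean = sum(len(t) / n_treated * (sum(c) / len(c)) for t, c in matched)
    return (treated_mean - control_mean) / control_mean


units = [
    (True, {"discipline": "Physics", "n_mentors": 1}, 12.0),
    (False, {"discipline": "Physics", "n_mentors": 1}, 10.0),
    (True, {"discipline": "Biology", "n_mentors": 3}, 30.0),
    (True, {"discipline": "Biology", "n_mentors": 3}, 30.0),
    (False, {"discipline": "Biology", "n_mentors": 2}, 20.0),
    (True, {"discipline": "Chemistry", "n_mentors": 1}, 99.0),  # no control: dropped
]
print(cem_relative_effect(units, {"n_mentors": [2, 4]}))
```

In this toy example the Chemistry unit has no comparable control and is discarded, and the Biology stratum counts twice as much as the Physics stratum because it holds twice as many treated units.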

For every independent variable, be it big-shot experience or hub experience, let $Q_i$ denote the $i$th quintile of the distribution of that variable. Then, for $i \in \{1, 2, 3, 4\}$, we build a separate CEM where the treatment and control groups are $Q_{i+1}$ and $Q_i$, respectively. The CEM results are depicted in Fig. 2. These results indicate that an increase in big-shot experience is significantly associated with an increase in the post-mentorship impact of proteges by up to 35%. Similarly, the hub experience is associated with an increase in the post-mentorship impact of proteges, although the increase never exceeds 13%. Furthermore, our analysis in Supplementary Note 5.3 and Supplementary Figs. 6, 7 suggests that these observations are not driven by differences in the proteges’ innate ability.
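The quintile-based design can be sketched with the standard library. This assumes the quintile boundaries are the cut points returned by `statistics.quantiles` with its default method (Python 3.8+) and that a value lying exactly on a cut point goes to the upper quintile; the paper does not specify either choice.

```python
from bisect import bisect_right
from statistics import quantiles


def quintile_labels(values):
    """Assign each value a quintile label 1..5 (1 = lowest) using statistics.quantiles cut points."""
    cuts = quantiles(values, n=5)  # four cut points, default 'exclusive' method
    return [bisect_right(cuts, v) + 1 for v in values]


def treatment_control_groups(values):
    """Index lists for each comparison Q_{i+1} (treatment) vs. Q_i (control), i = 1..4."""
    labels = quintile_labels(values)
    groups = {q: [idx for idx, lab in enumerate(labels) if lab == q] for q in range(1, 6)}
    return [(groups[i + 1], groups[i]) for i in range(1, 5)]
```

Each of the four returned pairs would then be passed to its own matching step, which is why a separate CEM is built for every adjacent pair of quintiles.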

Next, we compare the big-shot experience to the hub experience. As can be seen in Fig. 2, the mentorship outcome seems to be much more strongly associated with big-shot experience than with hub experience. Supplementary Figs. 8-12 as well as Supplementary Tables 8-17 show similar trends when (i) replacing $c_5$ with $c_{10}$ as per Sinatra et al.^^; (ii) computing our measures of mentorship quality using the maximum and median values instead of the average value; (iii) considering juniors and seniors to be those whose academic age is at most 6 and at least 9, respectively; and (iv) considering juniors and seniors to be those whose academic age is at most 5 and at least 10, respectively. Similar trends would also be observed if we replaced the average with the sum in our measures of mentorship quality, since we are controlling for the number of mentors; see Supplementary Note 5.1 for more details. These findings suggest that the scientific impact of the mentors matters more than their number of collaborators. Consequently, we restrict our attention to the big-shot experience throughout the remainder of our study. Supplementary Figs. 13-18 as well as Supplementary Tables 18-23 suggest that the association between big-shot experience and mentorship outcome persists regardless of the discipline, the affiliation rank, the number of mentors, the average age of the mentors, the protege’s gender, and the protege’s first year of publication.


[Figure 2: bar chart of the relative increase for each treatment vs. control comparison; legend: big-shot effect, hub effect.]

Fig. 2 The big-shot experience and hub experience of 3 million mentor-protege pairs. For every independent variable, be it big-shot experience or hub experience, $Q_i$ denotes the $i$th quintile of the distribution of that variable. For $i \in \{1, 2, 3, 4\}$, we consider $Q_{i+1}$ and $Q_i$ to be the treatment and control groups, respectively, and write $Q_{i+1}$ vs. $Q_i$ when referring to the CEM used to compare these two groups. The color of the bar indicates whether the independent variable is the big-shot experience (purple) or the hub experience (yellow), whereas the height of the bar equals $\delta$, which is the increase in the average post-mentorship impact of the treatment group relative to that of the control group. A t-test shows that the values of $\delta$ are all statistically significant; see the corresponding p-values in Supplementary Tables 2 and 3. Since scientific impact is sensitive to extreme values, we bootstrap a 95% confidence interval. The error bars represent the 95% confidence interval. ***p < 0.001, **p < 0.01, *p < 0.05. Source data are provided as a Source Data file.

The relationship between gender and mentorship. Next, we turn to a different exploratory analysis where we investigate the post-mentorship impact of proteges while taking into consideration their gender as well as the gender of their mentors. To this end, let $F_i$ denote the set of proteges that have exactly $i$ female mentors. We take the proteges in $F_0$ as our baseline, and match them to those in $F_i$ for $i \in \{1, 2, 3, 4, 5\}$, while controlling for the protege’s average big-shot experience, number of mentors, gender, discipline, affiliation rank, and the year in which they published their first mentored paper. Then, we vary the fraction of female mentors to understand how this affects the protege. More specifically, for any given $i > 0$, we compute the change in the post-mentorship impact of the proteges in $F_i$ relative to the post-mentorship impact of those in $F_0$, which we refer to by writing $F_i$ vs. $F_0$. The outcomes of these comparisons are depicted for male proteges in Fig. 3a, and for female proteges in Fig. 3b. As shown in this figure, having more female mentors is associated with a decrease in the mentorship outcome, and this decrease can reach as high as 35%, depending on the number of mentors and the proportion of female mentors.

Fig. 3 The relationship between gender and the gain from mentorship. a $F_i$ denotes the set of proteges from our 3 million pairs that have exactly $i$ female mentors. Focusing on male proteges, $F_i$ vs. $F_0$: $i = 1, ..., 5$ refers to the change in the average post-mentorship impact of proteges in $F_i$ relative to the average post-mentorship impact of those in $F_0$ while controlling for the protege's big-shot experience, number of mentors, discipline, affiliation rank, and the year in which they published their first mentored paper. A t-test shows that the values are all statistically significant; see the corresponding p-values in Supplementary Table 24. b The same as a but for female proteges instead of male proteges. c The gain of a mentor when mentoring a particular protege is measured as the average impact ($\langle c_5 \rangle$) of the papers they authored with that protege during the mentorship period. While controlling for the protege's discipline, affiliation rank, number of mentors, and the year in which they published their first mentored paper, the figure depicts the change in the mentor's average gain when mentoring a female protege relative to that when mentoring a male protege; results are presented for female mentors and male mentors separately. A t-test shows that the values are all statistically significant. Since scientific impact is sensitive to extreme values, we bootstrap a 95% confidence interval. The error bars represent the 95% confidence interval. *p < 0.05, **p < 0.01, ***p < 0.001. Source data are provided as a Source Data file.
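The figure captions state that the 95% confidence intervals are obtained by bootstrapping. One plausible form, shown here as a percentile bootstrap of the relative change between two groups (for example, proteges with some female mentors against those with none), is sketched below. It resamples each group independently and ignores any matching weights; this is a simplifying assumption, not the authors' documented procedure.

```python
import random
from statistics import mean


def relative_change(treated, control):
    """Relative increase of the treated mean over the control mean."""
    return (mean(treated) - mean(control)) / mean(control)


def bootstrap_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for relative_change, resampling each group with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        if mean(c) == 0:
            continue  # relative change is undefined for this resample
        stats.append(relative_change(t, c))
    if not stats:
        raise ValueError("every resample had a zero control mean")
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper
```

A percentile bootstrap makes no normality assumption about the ratio, which is one reason it is a common choice for heavy-tailed quantities such as citation counts.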

So far in our analysis, we only considered the outcome of the proteges. However, mentors have also been shown to benefit from the mentorship experience^^. With this in mind, we measure the gain of a mentor from a particular protege as the average impact, $\langle c_5 \rangle$, of the papers they authored with that protege during the mentorship period. We compare the average gain of a female mentor, F, against that of a male mentor, M, when mentoring either a female protege, f, or a male protege, m. More specifically, we compare mentor-protege relationships of the type (f, F) to those of the type (m, F), where f and m are matched based on their discipline, affiliation rank, number of mentors, and the year in which they published their first mentored paper. Similarly, we compare relationships of the type (f, M) to those of the type (m, M), where f and m are matched as above. The results of these comparisons are presented in Fig. 3c. In particular, the figure depicts the gain from mentoring a female protege relative to that of mentoring a male protege; the results are presented for female mentors and male mentors separately. These results suggest that, by mentoring female instead of male proteges, the female mentors compromise their gain from mentorship, and suffer on average a loss of 18% in citations on their mentored papers. As for male mentors, their gain does not appear to be significantly affected by taking female instead of male proteges.
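A minimal sketch of the mentor-gain measure and of the relative comparison reported in Fig. 3c follows. It assumes each paper record already stores its $c_5$ value under a hypothetical `c5` key and that the mentorship period is given as an inclusive year range; unlike the paper, it performs no matching between female and male proteges.

```python
from statistics import mean


def mentor_gain(mentor, protege, papers, start_year, end_year):
    """Average c5 of papers coauthored by mentor and protege within the mentorship period."""
    scores = [
        p["c5"]
        for p in papers
        if mentor in p["authors"]
        and protege in p["authors"]
        and start_year <= p["year"] <= end_year
    ]
    return mean(scores) if scores else None


def relative_gain_change(gains_female_proteges, gains_male_proteges):
    """Relative change in mean gain when mentoring female rather than male proteges."""
    base = mean(gains_male_proteges)
    return (mean(gains_female_proteges) - base) / base
```

A value of -0.18 from `relative_gain_change` would correspond to the 18% loss described above, although the reported figure comes from matched comparisons rather than raw group means.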

Discussion 

In this paper, we studied mentorship in academic collaborations, where junior scientists receive support from potentially multiple senior collaborators who do not necessarily have a formal supervisory role. We identified 3 million mentor-protege pairs, and conducted a survey with a random sample of proteges, the outcome of which provided evidence that the relationship between them and their identified mentors involved some form of
mentorship. Furthermore, having conceptualized mentorship 
quality in two ways—the big-shot experience and the hub 
experience—we found that both have an independent association 
with the protege’s impact post mentorship without their mentors. 
Interestingly, the big-shot experience seems to matter more than 
the hub experience, implying that the scientific impact of mentors 
matters more than the number of their collaborators. Our analysis also suggests that the association between the big-shot experience and the post-mentorship outcome persists regardless of the discipline, the affiliation rank, the number of mentors, the average age of the mentors, the protege’s gender, and the protege’s first year of publication. Finally, we studied the possibility
that the gender of both the mentors and their protege could 
predict not only the impact of the protege, but also the gain of the 
mentors, which we measure by the citations of the papers they 
published with the protege during the mentorship period. Future 
research could investigate the mechanisms that underlie our 
findings, e.g., (i) by comparing mentors who are newcomers to 
those who are incumbents; (ii) by analyzing the papers that cite the proteges to see how many of those are authored by the mentors’ collaborators; and (iii) by studying the topics that the
proteges work on during, and after, the mentorship to understand 
the skills that are transferred from the mentors to their proteges. 
These would be welcome extensions to the study, but remain 
outside of its current scope. 

While it has been shown that having female mentors increases 
the likelihood of female proteges staying in academia^^ and
provides them with better career outcomes^^, such studies often 
compare proteges that have a female mentor to those who do not 
have a mentor at all, rather than to those who have a male 
mentor. Our study fills this gap, and suggests that female proteges 
who remain in academia reap more benefits when mentored by 
males rather than equally impactful females. The specific drivers underlying this empirical finding could be manifold, such as female mentors serving on more committees, thereby reducing the time they are able to invest in their proteges^^, or women taking on less recognized topics that their proteges emulate^^, but these potential drivers are outside the scope of the current study. Our
findings also suggest that mentors benefit more when working 
with male proteges rather than working with comparable female 
proteges, especially if the mentor is female. These conclusions are 
all deduced from careful comparisons between proteges who 
published their first mentored paper in the same discipline, in the 
same cohort, and at the very same institution. Having said that, it 
should be noted that there are societal aspects that are not captured by our observational data, and the specific mechanisms
behind these findings are yet to be uncovered. One potential 
explanation could be that, historically, male scientists had enjoyed 
more privileges and access to resources than their female counterparts, and thus were able to provide more support to their
proteges. Alternatively, these findings may be attributed to sorting 
mechanisms within programs based on the quality of proteges 
and the gender of mentors. 

Our gender-related findings suggest that current diversity 
policies promoting female-female mentorships, as well-intended 
as they may be, could hinder the careers of women who remain in 
academia in unexpected ways. Female scientists, in fact, may 
benefit from opposite-gender mentorships in terms of their 
publication potential and impact throughout their post-mentorship careers. Policy makers should thus revisit the first- and second-order consequences of diversity policies while focusing not only on retaining women in science, but also on maximizing their long-term scientific impact. More broadly, the goal of gender equity in science, regardless of the objective targeted, cannot and should not be shouldered by senior female scientists alone; rather, it should be embraced by the scientific community as a whole.

Methods 

Data description. The data used for this study consists of all the papers included in 
the Microsoft Academic Graph (MAG) dataset up to December 31st, 2019^^.
This dataset includes records of scientific publications specifying the date of the 
publication, the authors’ names and affiliations, and the publication venue. It also 
contains a citation network in which every node represents a paper and every 
directed edge represents a citation. While the number of citations of any given 
paper is not provided explicitly, it can be calculated from the citation network in 
any given year. Additionally, every paper is positioned in a field-of-study hierarchy, 
the highest level of which consists of 19 scientific disciplines.

Using the information provided in the MAG dataset, we derive two key 
measures: the discipline of scientists and their impact. In particular, to determine 
the discipline of any given scientist, we consider his or her publications, which are 
themselves classified into disciplines by MAG. If 50% or more of those papers were 
from the same discipline, $d_i$, then the scientist’s discipline is considered to be $d_i$; otherwise, it is considered to be unclassified. As for the impact of each scientist in
any given year, it was derived from the citation network provided by MAG. In 
addition to the scientists’ discipline and impact, we derive additional measures such 
as the scientists’ gender, which is determined using Genderize.io^^ (see 
Supplementary Note 2), and the rank of each university, which is determined based 
on the Academic Ranking of World Universities, also known as the Shanghai 
ranking http://www.shanghairanking.com/ARWU2018.html. 
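The discipline rule can be expressed in a few lines. In this sketch a scientist is represented only by the list of disciplines that MAG assigns to their papers, and `None` stands for “unclassified”. When two disciplines each account for exactly half of the papers, the sketch returns whichever one `Counter` encountered first; the text does not say how such ties are resolved.

```python
from collections import Counter


def scientist_discipline(paper_disciplines, threshold=0.5):
    """Return the discipline of at least `threshold` of the papers, or None (unclassified)."""
    if not paper_disciplines:
        return None
    discipline, count = Counter(paper_disciplines).most_common(1)[0]
    return discipline if count / len(paper_disciplines) >= threshold else None


print(scientist_discipline(["Physics", "Physics", "Biology"]))    # Physics
print(scientist_discipline(["Physics", "Biology", "Chemistry"]))  # None
```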

Whenever a junior scientist (with academic age ≤ 7) publishes a paper with a senior scientist (academic age > 7), we consider the former to be a protege, and the
latter to be a mentor. We consider the start of the mentorship period to be the year 
of the first publication of the protege, and consider the end of the mentorship 
period to be the year in which the protege became a senior scientist. We analyze 
every mentor-protege dyad that satisfies all of the following conditions: (i) the 
protege has at least one publication during their senior years without a mentor; (ii) 
the affiliation of the protege is in the US throughout their mentorship years; (iii) 
the main discipline of the mentor is the same as that of the protege; (iv) the mentor 
and the protege share an affiliation on at least one publication; (v) during the 
mentorship period, the mentor and the protege worked together on a paper whose 
number of authors is 20 or fewer; and (vi) the protege does not have a gap of 5 years or more in their publication history. As a consequence, our analysis excludes all scientists: (i) who never published any papers without their mentors post-mentorship, as we cannot analyze their scientific impact in their senior years independent of their mentors; (ii) who only had solo-authored papers or collaborations with their junior peers or with seniors from other universities, as we cannot clearly establish who their mentors were; (iii) who had a gap of 5 years or more without any publications; and (iv) who only collaborated with senior scientists outside of their main discipline.
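Two of the dyad conditions, (i) and (vi), depend only on the protege's publication record and can be checked as sketched below. The reading of “a gap of 5 years or more” as a difference of at least 5 between consecutive publication years is an assumption, as is the hypothetical input format of `(year, authors)` tuples covering all of the protege's papers.

```python
def is_eligible_protege(first_pub_year, mentors, papers, junior_max_age=7, max_gap=5):
    """Check conditions (i) and (vi) for one protege.

    papers: list of (year, authors) tuples for all of the protege's papers.
    mentors: set of the protege's identified mentors.
    """
    # (i) at least one senior-year paper with none of the mentors among its authors
    has_independent_paper = any(
        year - first_pub_year > junior_max_age and not mentors.intersection(authors)
        for year, authors in papers
    )
    # (vi) assumed reading: no two consecutive publication years differ by max_gap or more
    years = sorted({year for year, _ in papers})
    has_long_gap = any(later - earlier >= max_gap for earlier, later in zip(years, years[1:]))
    return has_independent_paper and not has_long_gap
```

Condition (i) here reuses the same senior-years threshold as the outcome measure, so a protege passes exactly when at least one paper would enter their post-mentorship average.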

As our use sample we consider the ten disciplines in MAG that have the largest 
number of mentor-protege pairs, namely Biology, Chemistry, Computer Science,
Economics, Engineering, Geology, Materials Science, Medicine, Physics, and 
Psychology. These disciplines contain over 97% of all pairs identified as per the 
criteria above; see Supplementary Table 1. 

A total of 204 different Coarsened Exact Matchings (CEMs) were used to produce the results depicted in Fig. 2 and Supplementary Figs. 6-18. Additionally, a total of 32 different matchings were used to produce the results depicted in Fig. 3. More details about the confounding factors used therein, as well as the binning decisions, can be found in Supplementary Note 5.1.

Ethics statement. The survey portion of the study was approved by the NYUAD 
Institutional Review Board, #HRPP-2020-8. Informed consent was obtained from 
all of the participants, who also received incentives. 

Reporting summary. Further information on research design is available in the Nature 
Research Reporting Summary linked to this article. 

Data availability 

Microsoft Academic Graph (MAG) can be downloaded from https://bit.ly/3kPaUqe. All 
data generated from MAG for the purpose of this study are made available at https://bit.ly/3cHJJuC. A reporting summary for this Article is available as a Supplementary
Information file. Source data are provided with this paper. 